DOI: 10.1145/3519939.3523701 (PLDI Conference Proceedings)

All you need is superword-level parallelism: systematic control-flow vectorization with SLP

Published: 09 June 2022

ABSTRACT

Superword-level parallelism (SLP) vectorization is a proven technique for vectorizing straight-line code. It works by replacing groups of independent, isomorphic instructions with equivalent vector instructions. Larsen and Amarasinghe originally proposed using SLP vectorization (together with loop unrolling) as a simpler, more flexible alternative to traditional loop vectorization. However, this vision of replacing traditional loop vectorization has not been realized because SLP vectorization cannot directly reason about control flow.
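For illustration (this sketch is ours, not an example from the paper), the first function below contains four independent, isomorphic scalar additions, the classic seed for SLP packing; the second shows the effect of the transformation, written by hand with x86 SSE intrinsics for concreteness:

```c
#include <immintrin.h>

/* Four independent, isomorphic scalar additions: an SLP pack candidate. */
void add4_scalar(const float *a, const float *b, float *c) {
    c[0] = a[0] + b[0];
    c[1] = a[1] + b[1];
    c[2] = a[2] + b[2];
    c[3] = a[3] + b[3];
}

/* What SLP vectorization effectively produces: a single 4-wide
   vector add replacing the four scalar instructions. */
void add4_vector(const float *a, const float *b, float *c) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb));
}
```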

In this work, we introduce SuperVectorization, a new vectorization framework that generalizes SLP vectorization to uncover parallelism that spans different basic blocks and loop nests. With the capability to systematically vectorize instructions across control-flow regions such as basic blocks and loops, our framework simultaneously subsumes the roles of inner-loop, outer-loop, and straight-line vectorizer while retaining the flexibility of SLP vectorization (e.g., partial vectorization).
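To make the control-flow challenge concrete, consider a loop body with a lane-divergent branch (again a hedged sketch of ours, not code from the paper). A straight-line SLP vectorizer cannot pack the two sides of the branch; vectorizing it requires turning the control dependence into a data dependence (if-conversion) and selecting results per lane:

```c
#include <immintrin.h>

/* Control flow inside the loop body: each element takes its own
   branch, so the branch is divergent across SIMD lanes. */
void relu_scalar(float *x, int n) {
    for (int i = 0; i < n; i++) {
        if (x[i] < 0.0f)
            x[i] = 0.0f;
    }
}

/* If-converted form: the branch becomes a per-lane mask and a select
   (SSE4.1 _mm_blendv_ps). Assumes n is a multiple of 4 for brevity. */
void relu_vector(float *x, int n) {
    const __m128 zero = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4) {
        __m128 v    = _mm_loadu_ps(x + i);
        __m128 mask = _mm_cmplt_ps(v, zero);        /* lanes where x[i] < 0 */
        __m128 r    = _mm_blendv_ps(v, zero, mask); /* pick 0.0f in those lanes */
        _mm_storeu_ps(x + i, r);
    }
}
```

If-conversion of a single branch is a standard building block; the framework's contribution, per the abstract, is applying such transformations systematically across basic blocks and whole loop nests.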

Our evaluation shows that a single instance of our vectorizer is competitive with and, in many cases, significantly better than LLVM’s vectorization pipeline, which includes both loop and SLP vectorizers. For example, on an unoptimized, sequential volume renderer from Pharr and Mark, our vectorizer gains a 3.28× speedup, whereas none of the production compilers that we tested can vectorize it, owing to its complex control-flow constructs.

References

  1. 2022. Auto-Vectorization in GCC. https://gcc.gnu.org/projects/tree-ssa/vectorization.html
  2. 2022. Auto-Vectorization in LLVM. https://llvm.org/docs/Vectorizers.html
  3. 2022. llvm::TargetTransformInfo Class Reference. https://llvm.org/doxygen/classllvm_1_1TargetTransformInfo.html
  4. Randy Allen and Ken Kennedy. 1987. Automatic Translation of FORTRAN Programs to Vector Form. ACM Transactions on Programming Languages and Systems.
  5. Randy Allen, Ken Kennedy, Carrie Porterfield, and Joe Warren. 1983. Conversion of Control Dependence to Data Dependence. In Symposium on Principles of Programming Languages.
  6. Sara S. Baghsorkhi, Nalini Vasudevan, and Youfeng Wu. 2016. FlexVec: Auto-vectorization for Irregular Loops. In Programming Language Design and Implementation.
  7. Bob Blainey, Christopher Barton, and José Nelson Amaral. 2002. Removing Impediments to Loop Fusion Through Code Transformations. In International Workshop on Languages and Compilers for Parallel Computing.
  8. David Callahan, Jack J. Dongarra, and David Levine. 1988. Vectorizing Compilers: A Test Suite and Results. In ACM/IEEE Conference on Supercomputing.
  9. Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1991. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Transactions on Programming Languages and Systems.
  10. Tobias Grosser, Armin Größlinger, and Christian Lengauer. 2012. Polly – Performing Polyhedral Optimizations on a Low-Level Intermediate Representation. Parallel Processing Letters.
  11. Khronos Group. 2009. OpenCL 1.0 Specification. http://khronos.org/registry/cl/specs/opencl-1.0.pdf
  12. Ralf Karrenberg and Sebastian Hack. 2011. Whole Function Vectorization. In International Symposium on Code Generation and Optimization.
  13. Ken Kennedy and Kathryn S. McKinley. 1993. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution. In International Workshop on Languages and Compilers for Parallel Computing. 301–320.
  14. Samuel Larsen and Saman Amarasinghe. 2000. Exploiting Superword Level Parallelism with Multimedia Instruction Sets. In Programming Language Design and Implementation.
  15. Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization.
  16. Jun Liu, Yuanrui Zhang, Ohyoung Jang, Wei Ding, and Mahmut Kandemir. 2012. A Compiler Framework for Extracting Superword Level Parallelism. In Programming Language Design and Implementation.
  17. Charith Mendis and Saman Amarasinghe. 2018. goSLP: Globally Optimized Superword Level Parallelism Framework. Proceedings of the ACM on Programming Languages.
  18. Simon Moll and Sebastian Hack. 2018. Partial Control-Flow Linearization. In Programming Language Design and Implementation.
  19. Dorit Nuzman, Ira Rosen, and Ayal Zaks. 2006. Auto-vectorization of Interleaved Data for SIMD. In Programming Language Design and Implementation.
  20. Dorit Nuzman and Ayal Zaks. 2008. Outer-loop Vectorization: Revisited for Short SIMD Architectures. In International Conference on Parallel Architectures and Compilation Techniques.
  21. Karl J. Ottenstein, Robert A. Ballance, and Arthur B. MacCabe. 1990. The Program Dependence Web: A Representation Supporting Control-, Data-, and Demand-Driven Interpretation of Imperative Languages. In Programming Language Design and Implementation.
  22. Joseph C. H. Park and Mike Schlansker. 1991. On Predicated Execution.
  23. Matt Pharr and William R. Mark. 2012. ispc: A SPMD Compiler for High-Performance CPU Programming. In Innovative Parallel Computing.
  24. Vasileios Porpodas and Timothy M. Jones. 2015. Throttling Automatic Vectorization: When Less is More. In Conference on Parallel Architecture and Compilation.
  25. Vasileios Porpodas, Alberto Magni, and Timothy M. Jones. 2015. PSLP: Padded SLP Automatic Vectorization. In International Symposium on Code Generation and Optimization.
  26. Vasileios Porpodas, Rodrigo C. O. Rocha, and Luís F. W. Góes. 2018. VW-SLP: Auto-Vectorization with Adaptive Vector Width. In International Conference on Parallel Architectures and Compilation Techniques.
  27. Vasileios Porpodas, Rodrigo C. O. Rocha, Evgueni Brevnov, Luís F. W. Góes, and Timothy Mattson. 2019. Super-Node SLP: Optimized Vectorization for Code Sequences Containing Operators and Their Inverse Elements. In International Symposium on Code Generation and Optimization.
  28. Louis-Noël Pouchet. 2021. PolyBench/C: The Polyhedral Benchmark Suite. https://web.cse.ohio-state.edu/~pouchet.2/software/polybench/
  29. Rodrigo C. O. Rocha, Vasileios Porpodas, Pavlos Petoumenos, Luís F. W. Góes, Zheng Wang, Murray Cole, and Hugh Leather. 2020. Vectorization-Aware Loop Unrolling with Seed Forwarding. In International Conference on Compiler Construction.
  30. Ira Rosen, Dorit Nuzman, and Ayal Zaks. 2007. Loop-aware SLP in GCC. In GCC Developers Summit.
  31. Jaewook Shin, Mary Hall, and Jacqueline Chame. 2005. Superword-Level Parallelism in the Presence of Control Flow. In International Symposium on Code Generation and Optimization.
  32. Jean-Baptiste Tristan, Paul Govereau, and Greg Morrisett. 2011. Evaluating Value-Graph Translation Validation for LLVM. In Programming Language Design and Implementation.
  33. Peng Tu and David Padua. 1995. Efficient Building and Placing of Gating Functions. In Programming Language Design and Implementation.
